DOCUMENT RESUME

ED 161 795                                CS 003 373

AUTHOR          Pearson, Gregory
TITLE           Representing Meaning in Text.
SPONS AGENCY    National Inst. of Education (DHEW), Washington, D.C.
PUB DATE        Apr 77
GRANT           NIE-G-74-0018
NOTE            26p.; Paper presented at the Annual Meeting of the American Educational Research Association (New York, New York, April 1977)
EDRS PRICE      MF-$0.83 HC-$2.06 Plus Postage.
DESCRIPTORS     Computer Programs; *Learning Processes; Logic; Memory; *Networks; Prose; *Reading Research; Recall (Psychological); Reliability; *Scoring; *Semantics
IDENTIFIERS     *Prose Learning

ABSTRACT
A propositional network system for representing the logical and semantic information contained in a text is described. The reliability of scoring information recalled from reading, using this representational system, is found to vary with the scoring goal. Determination of the amount of information recalled is found to be extremely reliable. A computer implementation is described which compares the structure of passages and that of the information recalled from them to enable research on the structure of content acquired from reading prose. (Author)
***********************************************************************
Documents acquired by ERIC include many informal unpublished materials not available from other sources. ERIC makes every effort to obtain the best copy available. Nevertheless, items of marginal reproducibility are often encountered and this affects the quality of the microfiche and hardcopy reproductions ERIC makes available via the ERIC Document Reproduction Service (EDRS). EDRS is not responsible for the quality of the original document. Reproductions supplied by EDRS are the best that can be made from the original.
***********************************************************************




U.S. DEPARTMENT OF HEALTH, EDUCATION & WELFARE
NATIONAL INSTITUTE OF EDUCATION

THIS DOCUMENT HAS BEEN REPRODUCED EXACTLY AS RECEIVED FROM THE PERSON OR ORGANIZATION ORIGINATING IT. POINTS OF VIEW OR OPINIONS STATED DO NOT NECESSARILY REPRESENT OFFICIAL NATIONAL INSTITUTE OF EDUCATION POSITION OR POLICY.


Representing Meaning in Text

Gregory Pearson

Reading Research Group 

Department of Education 
Cornell University 


Ithaca, New York 14853


"PERMISSION TO REPRODUCE THIS COPYRIGHTED MATERIAL HAS BEEN GRANTED BY

Gregory Pearson

TO ERIC AND ORGANIZATIONS OPERATING UNDER AGREEMENTS WITH THE NATIONAL INSTITUTE OF EDUCATION. FURTHER REPRODUCTION OUTSIDE THE ERIC SYSTEM REQUIRES PERMISSION OF THE COPYRIGHT OWNER."


Paper presented at the annual meeting of the American Educational Research Association, April, 1977


The research reported here was supported through grant number NIE-G-74-0018, Structure and Learning From Prose, from the National Institute of Education, awarded to Dr. George W. McConkie.


Introduction 


The goal of this paper is to give an overview of the research which we have in progress on the representation of meaning in text. It should be pointed out right at the start that the thrust of the work which will be reported here has to date been primarily methodological in nature. Our current effort has been directed towards developing two components of a tool for use in psychological research. The first of these components is a notational system of sufficient generality and detail to represent a wide range of semantic information in a semantic structure which we take to be a representation of the meaning of a text at a level somewhat deeper than the surface structure language representation. The second part of this tool is an accurate and reliable scoring procedure. Here we take scoring procedure to mean a mapping, a way to go from the surface form of a text to a deeper level semantic representation of the text's meaning of the sort just mentioned. We have temporarily set aside questions concerning some theoretical matters such as the status of the deeper level semantic representation as a psychological model of human memory representation and questions as to the relationship of the notational system and scoring procedure to various linguistic approaches to looking at language.

In this research we have taken as our starting point the work on text meaning representation of Carl Frederiksen as we felt that the notational system developed by him had good potential for meeting the requirements stated above, namely those of generality and detail, so as to be useful for the purposes of psychological research. In the course of this paper I will describe the constituent structure of Frederiksen's notation in order to communicate a sense of this system's generality, that is, the kinds of semantic information it represents, and I will present part of his system more fully as an example of the detailed semantic distinctions which are made in his notation. Following this I will give examples of some difficulties we have experienced in using Frederiksen's system in its most detailed form and will present the kinds of modifications we have made in arriving at the system of notation used in the experimental work reported by Lucas (1977), Dee-Lucas (1977) and Smith (1977). The reliability of our modified version of Frederiksen, as used in these experiments, has been the subject of some preliminary investigation and the results from this are summarized below along with several scoring guidelines which constitute our beginning attempts to improve scoring reliability and in the process to specify the relationship between surface text form and the underlying semantic representation.

Next I will point out the natural correspondence which exists between the basic data structure of Frederiksen's notational system, the semantic network, and some computer data structures, which has led us to take advantage of a minicomputer to store our semantic content and semantic structure representations of subject recall texts. The last section of this paper will be devoted to some descriptive comments about the computer programs which we currently have available for manipulating these stored recall semantic structures. Finally, I will mention some directions which we plan to take in using the computer to discover characteristics of the structure of subject recall texts.


[...], which provides all the distinctions and notational devices needed to adequately capture the semantic and logical structure of meaning in text. Because of this there is a depth of detail in his article which makes it impossible for me to do justice to his system in the concise overview which follows. Thus my goal here is only to characterize his work to an extent which makes the framework of our own research clear. In the process I will somewhat interpret Frederiksen's article at a few points.

Frederiksen (1975) describes his notational system in terms of networks, where a network is of the usual sort, that is, a set of nodes connected by labeled arcs. In his semantic networks (as opposed to logical networks, which are left for later), the nodes are filled by semantic "concepts" and the arc joining two nodes is a relation, the nature of which is specified by the arc label. Figure 1 shows a simple semantic network and a linear propositional notation we have been using to represent it.


P1  (:BILL)--AGT@TEM(PAST)->('BREAK)--OBJ1->(:WINDOW)
                                /--TEM1->('YESTERDAY)
P2  (:WINDOW)--DEF->("THE)--NUM@->("1)

'BILL BROKE THE WINDOW'

Figure 1. Semantic network with possible text sentence and linear representation.
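To make the node-and-arc idea concrete, the following minimal Python sketch stores proposition P1 of Figure 1 as a list of (concept, relation, concept) triples. The `Triple` record and the helper names `to_linear` and `nodes` are illustrative assumptions; they are not part of Frederiksen's system or of the software described later in this paper.

```python
from collections import namedtuple

# Hypothetical record type: one labeled arc joining two concept nodes.
Triple = namedtuple("Triple", ["source", "relation", "target"])

# Proposition P1 of Figure 1 written as a list of triples. The concept
# prefixes (":" for objects, "'" for actions) follow the linear notation.
P1 = [
    Triple(":BILL", "AGT@TEM(PAST)", "'BREAK"),
    Triple("'BREAK", "OBJ1", ":WINDOW"),
    Triple("'BREAK", "TEM1", "'YESTERDAY"),
]


def to_linear(triple):
    """Render one triple in the linear propositional notation."""
    return "({})--{}->({})".format(triple.source, triple.relation, triple.target)


def nodes(proposition):
    """Return the set of concepts that fill the nodes of a proposition."""
    return {t.source for t in proposition} | {t.target for t in proposition}


for triple in P1:
    print(to_linear(triple))
```

Keeping each arc as a separate record means that the same list can later be read either as a linear proposition or as the arc list of a graph.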


Two concepts and a labeled relation joining them form the basic unit which makes up a semantic network. The taxonomy of these elements is shown in Figure 2.


                          objects
            concepts      actions
                          attributes
network
elements
                          case
            relations
                          identification

Figure 2. Taxonomy of semantic network elements.


Objects are "things which occupy space" such as 'animal', 'movie', 'wind', 'book', and 'rock' (where the single-quoted lexical items are taken to indicate the semantic concepts). Actions are "things which occupy a position or interval of time and involve change" such as 'break', 'write (something)', 'think (of something)', 'breathe', 'ride (a horse)', 'play (baseball)', 'play (with someone)', 'know (something)', and 'see (something)'. The class of attributes is not explicitly defined in Frederiksen (1975) since it is clear that it contains things like 'red' and [illegible].

Identifying relations are defined by Frederiksen (1975) as relations which identify an object or action (or class of objects or actions), thus serving to distinguish the object or action (or classes thereof) from others of the same type. Case relations, which "specify a causal system involving an action," are somewhat different from identifying relations, as will be pointed out below.


The various kinds of concepts combine with identifying relations to define "systems", each of which specifies a different type of semantic information. Figure 3 lists the five systems defined by the five types of relations along with the concept types associated and the semantic information specified.

INSERT FIGURE 3 ABOUT HERE

The five identificational systems and the heretofore unused case relations combine at a next higher level of analysis to make up semantic network propositions.


Propositions are defined as representing either a state or an event. A state is an object (or object class) along with all of the identificational information, represented by triples from any of the systems in Figure 3, which distinguishes the object (or object class) from other classes of objects. Thus, a stative proposition, an example of which follows, represents a state.


('The man's hair was very red in the sun yesterday.')

P3  (:HAIR)<-HASP--(:MAN)
        /--ATT->("RED)--DEG->('VERY)
        /--LOC@P('IN)->(:SUN)
        /--TEM->('YESTERDAY)


Note that this example stative proposition contains triples from only four of the five systems since manner can apply only when an action is involved.

An action, along with its obligatory and optional case relations and any identifying relations, constitutes an event. As with the stative proposition type above, an event is represented by an event proposition:
('The man dashed along the path from the door to the gate at 10 AM')

P4  (:MAN)--AGT->('DASH)--DAT1->(:MAN)
        /--SOURCE->[P5]
        /--RESULT->[P6]
        /--TEM1->('10 AM, ())
        /--LOC1,1->(:PATH)


P5  (:MAN)--LOC0,0->(:PATH)

P6  (:MAN)--LOC0,0->(:GATE)


In this example, AGT, DAT1, SOURCE, and RESULT are case relations and TEM1 (from the temporal system) and LOC1,1 (from the locative system) are identifying relations.


In addition to the semantic network just described, Frederiksen talks about a logical network which is similar to a semantic network in form. Instead of concepts filling the network nodes, however, it is propositions that occur in these slots. The logical relation which can label the arc between two nodes can be one of three types:

                 logical
logical
relations        causal

                 algebraic


Just as in the semantic network, these relational types define systems which specify different types of logical information. These systems are summarized in Figure 4.

INSERT FIGURE 4 ABOUT HERE

Since logical networks have not been used to a large extent in our current research, I will say no more about them here.


A Modified Network Representation

One of the features of Frederiksen's notation is the great amount of detail the system is potentially capable of representing. This is a result of the large number of fine distinctions made in the taxonomy of concepts and the classification of relations which can co-occur with these various concept classes. The classification for actions, for example, is given in Figure 5 (this is Frederiksen's (1975) figure 2). Note that by making basically four distinctions (± result, ± physical, ...), nine classes of actions are defined. Figure 6 presents these four distinctions in a systemic network to emphasize the systems involved in the hierarchical tree diagram of Frederiksen (fig. 5).


Let us consider only the resultive/processive action distinction for the purposes of this discussion. It was noted above that actions are characteristically associated with a set of obligatory and optional case relations, called a case system. The case system for resultive actions has the following network configuration:
(animate object)*--AGT->      (resultive action)  --OBJ1->    (inanimate object)
        -or-                                      --DAT1->    (animate object)
(inanimate object)*--I-AGT->                      --SOURCE->  [prop]**
                                                  --RESULT->  [prop]**
                                                  --INST->    (object)
                                                  --GOAL1->   [prop]

*  obligatory
** one of two is obligatory

(If I-AGT is present, GOAL1 cannot be present)


The corresponding case system for processive actions is:

(animate object)*--PAT->      (PROCESS)  --OBJ2->    (inanimate object)**
        -or-                             --DAT2->    (animate object)**
(inanimate object)*--I-PAT->             --THEME2->  [prop]***
                                         --GOAL2->   [prop]

*   obligatory
**  one of two obligatory if process is relative
*** obligatory if process is cognitive

(If I-PAT is present, THEME2 and GOAL2 cannot be present)


My point here is not to explain the operation of the case system; rather, I would like to show that the classification of a particular action as processive or resultive has a great effect on the concepts and relations used in representing an event associated with the action.
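The dependence of the permitted relations on the action class can be illustrated with a small Python sketch. The relation labels are read off the two diagrams above; the dictionary layout and the function name `frame_problems` are assumptions made for illustration, and only some of the constraints are checked (the "one of two is obligatory" conditions are left out).

```python
# Case relations read off the two case system diagrams. The dictionary layout
# and the name frame_problems are illustrative assumptions.
CASE_SYSTEMS = {
    "resultive": {
        "cause": {"AGT", "I-AGT"},
        "other": {"OBJ1", "DAT1", "SOURCE", "RESULT", "INST", "GOAL1"},
        "barred_with_inanimate_cause": {"GOAL1"},
    },
    "processive": {
        "cause": {"PAT", "I-PAT"},
        "other": {"OBJ2", "DAT2", "THEME2", "GOAL2"},
        "barred_with_inanimate_cause": {"THEME2", "GOAL2"},
    },
}


def frame_problems(action_class, relations):
    """List the ways a set of relation labels violates a case system."""
    system = CASE_SYSTEMS[action_class]
    relations = set(relations)
    problems = []
    causes = relations & system["cause"]
    if len(causes) != 1:
        problems.append("exactly one cause relation is obligatory")
    for label in sorted(relations - system["cause"] - system["other"]):
        problems.append(label + " is not in the " + action_class + " system")
    if any(label.startswith("I-") for label in causes):
        for label in sorted(relations & system["barred_with_inanimate_cause"]):
            problems.append(label + " cannot occur with an inanimate cause")
    return problems


event = ["AGT", "OBJ1", "RESULT"]
print(frame_problems("resultive", event))   # no problems reported
print(frame_problems("processive", event))  # every label is out of place
```

The same set of labels is acceptable under one classification and wholly unacceptable under the other, which is the practical consequence of the classification decision.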


The defining features used by Frederiksen in making his taxonomic distinctions, [± result] in the case under discussion, are regarded as semantic primitives in his system. As a result, if there is disagreement, there is no set way for deciding when a given feature is present or absent. For an action, then, this means that it is not always clear whether the action is processive or resultive. Consequently, we don't know whether the PAT/I-PAT or AGT/I-AGT relation should be used to represent the relationship between an action and its immediate cause. For example, 'breathe' is given by Frederiksen as a processive action, one which does not produce a change in a state or other process. It would seem, however, that breathing involves changing unoxygenated blood into oxygenated blood and this looks like a change in a state, at least to some people. Again, the point here is not to get into the detailed workings of Frederiksen's semantic primitives but rather to point out that, in practice, one's intuition about semantic primitives is often an unreliable guide. It may well be the case that empirical research or detailed linguistic investigation is needed before a particular action can be confidently marked as ± [result].


Our general solution to this kind of problem has been to collapse distinctions which we have trouble applying. Our current text meaning representations do not keep the processive/resultive distinction and thus we have collapsed the PAT and AGT relations into one (AGT). Note that while this results in a "modified" representation, the resultant network is not different in kind. The basic elements and structure are still the same; only the number of relations changes. The distinctions, and along with them the finer relation specifications, can easily be recovered if such distinctions prove necessary to account for effects observed in our experimental data.
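One way to picture a collapse that leaves the finer distinction recoverable is sketched below in Python. The mapping table, the function names `collapse` and `restore`, and the choice to record the original labels by position are all assumptions for illustration; the paper does not say how recovery would actually be carried out.

```python
# Assumed mapping: the processive cause label is folded into the resultive one.
COLLAPSE = {"PAT": "AGT"}


def collapse(triples, mapping=COLLAPSE):
    """Relabel arcs, remembering each original label that was changed."""
    collapsed, originals = [], {}
    for index, (source, relation, target) in enumerate(triples):
        new_relation = mapping.get(relation, relation)
        if new_relation != relation:
            originals[index] = relation
        collapsed.append((source, new_relation, target))
    return collapsed, originals


def restore(collapsed, originals):
    """Undo collapse() using the recorded original labels."""
    return [(source, originals.get(index, relation), target)
            for index, (source, relation, target) in enumerate(collapsed)]


event = [(":MAN", "PAT", "'BREATHE")]
collapsed, originals = collapse(event)
print(collapsed)                         # the arc now carries the label AGT
print(restore(collapsed, originals))     # the original PAT label is back
```

Only the arc labels change; the nodes and the shape of the network are untouched, which is the sense in which the modified network is "not different in kind".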

I will briefly mention a second way in which we have simplified Frederiksen's notation. We have chosen to abbreviate Frederiksen's detailed representation in a number of systems by using surface text lexical items as relation modifiers. This is much as Kintsch (1974) has done in handling similar relations. Frederiksen's locative system, for example, contains some eleven relations which specify spatial position. One of these is the locative relation LOC1,1 which appears in the network for the example sentence 'the man dashed along the path...' given above. This relational label LOC1,1 specifies that the action 'dash' is located as a one-dimensional path in a one-dimensional field. The notation ('dash) - LOC1,1 -> (:path) is taken to represent the locative meaning of the surface text form 'dash along the path.' Locative prepositions in English do not necessarily convey locative meaning which can so easily be represented. Often the locative relations are quite complex. To avoid having to spell out all these complexities we have used just one locative relation, LOC, and have allowed the surface text preposition to modify it. In this abbreviated form 'dash along the path' would be represented by ('dash) - LOC@P('along) -> (:path), that is, there is a locative relation LOC between 'dash' and 'path' which is further specified as a position P of the type ('along').
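The abbreviation just described amounts to a simple relabeling, sketched here in Python. The function name `abbreviate_locative` and the tuple representation of a triple are hypothetical; the output string follows the LOC@P('along) form given in the text.

```python
def abbreviate_locative(triple, preposition):
    """Replace any detailed locative label by LOC modified by a preposition.

    `triple` is a (source, relation, target) tuple; `preposition` is the
    surface text preposition that carries the locative meaning.
    """
    source, relation, target = triple
    if relation.startswith("LOC"):
        relation = "LOC@P('{})".format(preposition)
    return (source, relation, target)


detailed = ("'dash", "LOC1,1", ":path")
print(abbreviate_locative(detailed, "along"))
```

Non-locative arcs pass through unchanged, so the function can be applied to every triple of a proposition.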


Note once again that this kind of modification retains the basic form of Frederiksen's fully detailed system and we assume that our abbreviated notation could readily be expanded into the more detailed form if this were necessary for the purposes of our psychological research.

Scoring Recalls


Earlier in this paper I defined "scoring procedure" to be the mapping which exists between the surface structure of a text and a deeper semantic level which we are representing in the manner described above. Here I would like to talk about another kind of scoring, namely that activity which a person performs when comparing a recall protocol to the semantic representation of the original or target text. In this case, the scorer is faced with a task which asks him to intuitively ascertain what semantic information is contained in the recall protocol and to decide if this same semantic information is to be found in the target semantic representation. A more rigorous way for the scorer to proceed would be for him to generate a unique semantic representation of the recall and then compare the target and recall semantic representations. Moreover, the ideal scoring would occur if we had a deterministic procedure for generating the underlying semantic representations which are to be compared. This deterministic procedure would be, in essence, a recognition "machine" or grammar and it would fully specify the kind of mapping mentioned above, i.e., the one which exists between surface structure and deep semantic structure. We, and everyone else I believe, are far from the ideal state and consequently our method of scoring has been of the first sort, where a scorer is presented with an explicit semantic representation of the target text and asked to check the elements of it which he judges to be contained in the recall protocol, based on the surface structure text of the recall.

To test in a preliminary way whether this kind of scoring is a reasonable approach productive of stable data, we undertook to determine interscorer reliability on a set of five recall protocols. Three of these protocols were taken from the data collected by Smith (1977), one immediate recall, one three-week delayed and one three-month delayed. In Table 1 these are recalls A1, A2A, and A7B, respectively. Two other immediate recalls, C103 and D201, were chosen, one from Dee-Lucas (1977) and one from Lucas (1977), irrespective of experimental condition. Since we were interested in getting a conservative evaluation of our scoring, recalls were selected which tended to exemplify surface text characteristics which we thought would cause problems for scoring. For example, the three-month delayed recall A7B was marked by ambiguity of reference and paraphrased information related to multiple aspects of the target passage.

The five recall protocols were each scored by three scorers in the manner described above. In addition to checking as recalled elements in the target semantic representation, the scorers were allowed to record element substitutions, when items in the recall protocol were taken in context to be functionally equivalent to an element in the target. For example, 'send' in a recall could take the place of 'commission' in the target in a context such as

(:PETER)--AGT->('COMMISSION)--OBJ->(:BERING)
                    /--GOAL->[...]

Equivalent substitutions were treated as checks. No strict guidelines were specified and although the scorers were familiar with the semantic representation system used, there was no training for consistency.
Our results indicate that the reliability of this intuitive scoring varies with respect to the level of analysis. When considering the amount of information recalled in the protocol, that is, the number of elements checked in the target representation, the reliability was extremely high (r = .99). When considering the kind of information recalled, the reliability varies with the level of representation, that is, with the number of categories (for propositions r = .84, for elements r = .78).
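For readers who want to see what an inter-scorer correlation on the amount of information recalled involves, here is a minimal Python sketch of the Pearson product-moment correlation. The counts for the two scorers are invented for illustration and are not the data of this study; the function name `pearson_r` is likewise an assumption.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical numbers of target elements checked by two scorers
# for each of five recall protocols.
scorer_a = [42, 30, 12, 55, 38]
scorer_b = [44, 29, 14, 53, 39]

print(round(pearson_r(scorer_a, scorer_b), 3))
```

At the element level the same function could be applied to vectors of 0 and 1 (element not checked, element checked), which is a stricter comparison than comparing totals.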

Since we are primarily interested in research at the element level, we further undertook to discover the kinds of problems in scoring which cause variability at this level. From an analysis of the scoring data, we identified four areas of variability which we have labeled as

1. scorer error (ER)
2. scorer omission (OM)
3. idiosyncratic technique (IT)
4. lack of best fit (BF)

The first two of these areas, 'error' and 'omission', are in a sense similar but we have separated them here for the purpose of determining their individual effect on reliability. Here an error is taken to be an instance where the scorer checks as present in the recall a target element which is clearly absent, while an omission is just the opposite: the scorer fails to check something clearly present. I use the term 'clearly' here to indicate that an extremely conservative criterion has been used to assign inter-scorer discrepancies to either of these categories. Discrepancies which were at all problematic were left as unresolved variability.


Category three, which was called 'idiosyncratic technique' for lack of a better name, actually contains only one way of checking, which proved to be relatively frequent in the scoring of one scorer. In some cases what appears in a recall as an equivalent substitution can be seen as having a source in a different target proposition. For example, if 'Secretary of State' is found substituted for 'Seward' in the following context:

P7  (:SEWARD)--AGT->('BUY)--OBJ->(:ALASKA)
P8  [...]

one has sort of a 'reference chain' which one scorer attempted to capture by placing a check in P8 above.

The final category, 'lack of best fit', is somewhat more complex. It is sometimes the case that the scorers seem agreed in their identification of a piece of the surface structure of the recall text which can be matched to the target semantic representation but end up recording checks in different places in [...] when the surface structure of the target [...]. Take

(1) They [the Indians] lived both on the southe[...] region.

(2) Indians lived in the south east, and mid[dle] part of [...].

Figure 7 shows the varied scoring of this case. [...] the head noun of two noun phrases, 'southeast (part)' [...] two takes the two noun phrases to be 'southeast' [...] variability even though [...] recalled. There is, however, evi[dence] [...] perhaps 'southeast' has been [...] substitution) and that 'middle part of' is simply an expanded form of 'interior'. 'Area' is then left as a substitution for 'region'. The 'best fit' guideline which comes out of this reflects the fact that recalls, especially immediate recalls, tend to mimic the surface form of the target passage and that scoring should try to maximize this verbatim (in this case, structurally "verbatim") aspect as a way of resolving ambiguity.
A second type of case is also included in the 'best fit' category. As an example, examine the following target (3) and recall (4) sentences which were scored as shown in Figure 8.

(3) They [Bering and his men] discovered St. Lawrence Island and sailed through the Bering Strait.

(4) The explorer [Bering] found the Bering Strait.

INSERT FIGURE 8 ABOUT HERE

Again, there appears to be agreement about the information to be scored. Here there is a sense in which 'sail through' is like 'discover' in the target passage and it is this that is reflected in the recall protocol. Scorer one appears to give primacy to the 'discover' - 'find' similarity, while scorers two and three seem to focus on the 'Bering Strait' identity. It would be nice to have a clear-cut guideline for how to handle options like this but, unfortunately, I do not feel that we have one to offer at this time. As a preliminary step we have used a simple count of the number of equivalent substitutions involved and their degree of similarity. In this case, for example, placing the information in P251 requires three substitutions, one of which can be considered quite dissimilar ('Bering Strait' for 'St. Lawrence Island') or even non-equivalent. Placement in P252 requires only two substitutions ('find' for 'sail' or, if you would, 'sail through') since 'explorer' is clear as referentially 'Bering' from the context of the recall protocol. We realize that this sort of counting is ad hoc but it will be interesting to investigate just how far such a simple metric can take us in reducing scorer variability.
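The substitution count can be written down as a small Python sketch. The data layout, the similarity flags, and the function names `placement_cost` and `best_fit` are assumptions for illustration; in particular, whether a substitution counts as "similar" is a scorer's judgement, and the flags below are hypothetical.

```python
# Each candidate placement lists the substitutions it would require as
# (target item, recall item, judged_similar). The similarity judgements
# are hypothetical stand-ins for a scorer's decision.
candidates = {
    "P251": [
        ("they", "the explorer", True),
        ("discover", "find", True),
        ("St. Lawrence Island", "Bering Strait", False),
    ],
    "P252": [
        ("they", "the explorer", True),
        ("sail through", "find", True),
    ],
}


def placement_cost(substitutions):
    """Return (number of substitutions, number judged dissimilar)."""
    dissimilar = sum(1 for _, _, similar in substitutions if not similar)
    return (len(substitutions), dissimilar)


def best_fit(candidates):
    """Pick the placement needing the fewest, then most similar, substitutions."""
    return min(candidates, key=lambda name: placement_cost(candidates[name]))


for name, substitutions in candidates.items():
    print(name, placement_cost(substitutions))
print("best fit:", best_fit(candidates))
```

Comparing the pairs in order means the raw count decides first and the number of dissimilar substitutions only breaks ties; a different weighting would be an equally defensible reading of "degree of similarity".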


In order to see what effect the variability in each of these four categories had on the inter-scorer correlation for our sample data, we transformed the scored protocols by removing these kinds of variability, category by category. At each stage we recalculated the inter-scorer correlation and, as a rough estimate of the amount of variability accounted for by each transformation, we squared the correlation and subtracted that of the previous level to arrive at a difference figure for each level. The results from this process are presented in Table 1.
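The arithmetic of the difference figures can be sketched in a few lines of Python. The correlations used here are invented for illustration and are not the values of Table 1; the function name `variability_accounted` is an assumption.

```python
def variability_accounted(correlations):
    """Differences in squared correlation between successive levels."""
    squared = [r * r for r in correlations]
    return [round(later - earlier, 4)
            for earlier, later in zip(squared, squared[1:])]


# Hypothetical inter-scorer correlations: the raw scoring first, then the
# value recalculated after each category of variability is removed in turn.
levels = [0.80, 0.90, 0.92, 0.95, 0.96]

print(variability_accounted(levels))
```

Each figure is the gain in shared variance (r squared) obtained by removing one more category, so the figures are directly comparable across categories.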


INSERT TABLE 1 ABOUT HERE

It is interesting to note that the largest amount of variability, some 2[...], is due to scorer errors or omissions and that only 1[...] is attributable to the notation difficulties which we have identified here. We conclude from this that the most immediate gains can be made by providing some scoring aids or procedure to help the scorer simply be accurate. The overall level of correlation (r = .924), or even the correlation (r = .884) at level two, is relatively high considering that it represents the strictest form of element-by-element reliability. It would appear from this that scoring of the type described here can result in data of relatively high stability considering the level of analysis.

One further comment should be made based on the data in Table 1. Recall protocol A7B is notable in that it diverges considerably from the others in scorer reliability. While it is hard to draw much of a conclusion from the evidence of one case, it may be that long-delayed recalls (three months here) are characteristically more difficult to score reliably, hence making research in this area more problematic.

Computer Implementation


The category-by-category transformation mentioned in the preceding section is just one kind of semantic network operation which is greatly aided by the computer software system that we have implemented on our PDP-11/40 laboratory computer. The basic core of this system provides for various input and output functions, editing of input semantic structures, and scoring. At present data analysis functions are limited to 1-to-1 comparisons of structures, correlations, and data summaries of the kind shown in Figures 7 and 8. This software is written in a list processing extension to FORTRAN called RT-11 SLIP which has been implemented on our machine specifically for this prose analysis system.
Although there is a convenient relationship between a network, in this case our semantic network, and the computer data structure known as a list structure (hence the RT-11 SLIP implementation), I will devote the remainder of this section to a discussion of graphs. The reason for this is that our current computer development work is aimed at implementing the software necessary for us to employ algorithms on graphs in our attempt to characterize the semantic structure of texts.
, . 


As an example, let us consider the semantic network in Figure 9a.


INSERT FIGURE 9 ABOUT HERE 


a ae 


This semantic network, either in its linear representation or its node-and-arc
form, is a kind of graph, namely a labeled network. A graph of this sort
has names identifying the nodes (1, 2, ...n) as well as labels associated
with the nodes ('Indian' for node 1, 'live' for node 2, etc.). In addition the
arcs between nodes have weights associated with them (AGT, LOC, etc.). Further,
a digraph can be completely specified by a square matrix which has as many
dimensions as there are unique nodes. The unweighted matrix for 9a, called
simply the adjacency matrix for 9a, is shown in 9b. If an arc exists from node
1 to node 2, then there is a 1 in matrix cell 1,2; if there is an arc from 1 to
3, then there is a 1 in cell 1,3, etc.; otherwise the cell is zero. There are
other kinds of matrices associated with network 9a but I mention only this one
since it is a simple example of how a graph operation produces what may be an
interesting result. If we square the adjacency matrix, for example, then the
value in some cell i,j in the resultant matrix is the total number of distinct
sequences, or paths, from node i to node j that have length 2. Further
powers of the matrix give the paths of lengths 3, 4...n until all the cells
become zero. This may seem like a difficult way of discovering paths and their
lengths but this is only so for a semantic network as simple as that in Figure 9a.
Even a short and relatively simple text has a semantic network of such complexity
that it makes the hand computation of paths and path lengths prohibitive in
terms of time.
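The matrix operations just described can be sketched in a few lines of Python. This is an illustration only, not the RT-11 SLIP implementation: the arcs are read from the linear form of Figure 9a with the prepositions dropped, nodes 1 and 2 are numbered as in the text, and the numbers given to the other nodes, together with all function names, are assumptions made for the example. The loop also assumes a network without cycles, since only then do the successive powers eventually become all zero.

```python
# Node numbers 1 and 2 follow the text; the remaining numbers are assumed.
LABELS = {1: "INDIAN", 2: "LIVE", 3: "COAST", 4: "REGION", 5: "INTERIOR", 6: "GROUP"}

# Arcs as (from, to, relation), read from the linear form of Figure 9a.
ARCS = [
    (6, 1, "CAT"),
    (1, 2, "AGT"),
    (2, 3, "LOC"),
    (2, 4, "LOC"),
    (4, 5, "ATT"),
]


def adjacency_matrix(n, arcs):
    """Unweighted adjacency matrix: cell (i, j) is 1 if an arc runs from i to j."""
    matrix = [[0] * n for _ in range(n)]
    for src, dst, _relation in arcs:
        matrix[src - 1][dst - 1] = 1
    return matrix


def multiply(a, b):
    """Ordinary product of two square matrices of the same size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def path_count_matrices(adjacency):
    """Map each path length to the matching power of the adjacency matrix.

    Stops at the first all-zero power; the cap at n powers guards against
    looping forever if the network should contain a cycle.
    """
    n = len(adjacency)
    result = {}
    power = adjacency
    for length in range(1, n + 1):
        if not any(any(row) for row in power):
            break
        result[length] = power
        power = multiply(power, adjacency)
    return result


ADJ = adjacency_matrix(len(LABELS), ARCS)
PATHS = path_count_matrices(ADJ)


def count_paths(src, dst, length):
    """Number of distinct paths of the given length from node src to node dst."""
    matrix = PATHS.get(length)
    return matrix[src - 1][dst - 1] if matrix else 0
```

For instance, count_paths(1, 3, 2) asks how many paths of length 2 lead from 'Indian' to 'coast'; in this small network there is exactly one, passing through 'live'.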


There are, of course, more complex algorithms on graphs, such as those dealing
with connectivity, and it is these which really require the power of a
computing machine. We are hopeful that the application of these algorithms will
be of use in characterizing the structure of meaning in text.
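As one hedged illustration of a connectivity question, the sketch below finds the weakly connected components of a network, that is, the groups of nodes that hang together when arc direction is ignored. The paper does not name a particular connectivity algorithm; breadth-first search is simply a common choice, and the function name and the node names (taken from the linear form of Figure 9a) are assumptions for the example.

```python
from collections import defaultdict, deque

# Arcs of Figure 9a as (from, to); relation labels are left out here.
ARCS = [
    ("GROUP", "INDIAN"),
    ("INDIAN", "LIVE"),
    ("LIVE", "COAST"),
    ("LIVE", "REGION"),
    ("REGION", "INTERIOR"),
]


def weak_components(arcs):
    """Return the weakly connected components as sorted lists of node names."""
    neighbours = defaultdict(set)
    for src, dst in arcs:
        # Record each arc in both directions so that direction is ignored.
        neighbours[src].add(dst)
        neighbours[dst].add(src)

    seen = set()
    components = []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Breadth-first search outward from an unvisited node.
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for nxt in sorted(neighbours[node]):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(sorted(component))
    return components
```

Applied to the full network this gives a single component. If, hypothetically, a recall protocol omitted the AGT arc from 'Indian' to 'live', the same call would return two separate fragments, which is the kind of structural difference such algorithms could expose.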


References


Aho, A.V., Hopcroft, J.E., & Ullman, J.D. The Design and Analysis of Computer
Algorithms. Reading, Mass.: Addison-Wesley, 1976.


Berztiss, A.T. Data Structures, Theory and Practice (2nd ed.). New York:
Academic Press, 1975.


Chafe, W.L. Meaning and the Structure of Language. Chicago: The University of
Chicago Press, 1970.


Hehbidis D. The effects of [title illegible]. Paper presented at the annual
meeting of the American Educational Research Association, New York, April
1977.


Fillmore, C.J. The case for case. In E. Bach and R. Harms (eds.), Universals
in Linguistic Theory. New York: Holt, Rinehart & Winston, 1968.

Frederiksen, C.H. Representing logical and semantic structure of knowledge
acquired from discourse. Cognitive Psychology, 7, 1975.


Hudson, R.A. English Complex Sentences. Amsterdam: North-Holland, 1971.


Kintsch, W. [title illegible]. Hillsdale, N.J.:
L. Erlbaum Associates, 1974.


Lucas, P.A. Anticipation of test format: some effects on retention. Paper
presented at the annual meeting of the American Educational Research
Association, New York, April, 1977.


Pearson, G. [title illegible] (Research Report no. 7). [illegible]
Education, December, 1976.


Smith, H. Memory over varying intervals of time. Paper presented at the annual
meeting of the American Educational Research Association, New York, April
1977.
TABLE 1 - INTER-SCORER RELIABILITY
(CORRELATION COEFFICIENTS)

Column heading: VARIABILITY CATEGORY (only "Initial" is legible)
Row headings: RECALL PROTOCOL (C103, D201), MEAN CORR., DIFFERENCE*
[The coefficient values are not legible in this copy.]

*Difference: increment in variability accounted for by eliminating
each source of unreliability.


RELATIONS    CONCEPTS            SEMANTIC INFORMATION

temporal     objects             absolute time, elapsed time or duration
locative     objects, actions    location at point, on path or in region
stative      objects             part-whole, classification, attribution,
                                 symbolic content, determination, quantification
manner       actions             classification, attribution
degree       attributes          extent

Figure 3. Identificational semantic systems.


RELATIONS                      LOGICAL INFORMATION

logical                        conjunction, disjunction, implication, negation
causal                         cause
algebraic (relative system)    equivalence, relative location, relative time

Figure 4. Logical systems.


ACTIONS
  RESULTIVE [+result]
    PHYSICAL [+physical]
      NONSYMBOLIC RESULT [-theme]    (1)
      SYMBOLIC [+theme]              (2)
    COGNITIVE [+theme, -physical]    (3)
  PROCESSES [-result]
    PHYSICAL [+physical]
      NONCOGNITIVE [-theme]
        SIMPLE                       (4)
        RELATIVE                     (5)
      COGNITIVE [+theme]
        SIMPLE                       (6)
        RELATIVE                     (7)
    COGNITIVE [+theme, -physical]
      SIMPLE                         (8)
      RELATIVE                       (9)

Figure 5. Action hierarchy. Examples: (1) break, give, go (somewhere), kill,
build, make (something), walk (somewhere); (2) write (something), draw (something),
compose (something), ask (a question), say (something); (3) think (of something),
imagine (something), learn (something), remember (something); (4) breathe, sleep,
walk, dance, burn, blow, grow; (5) ride (a horse), drive (a car), walk (the dog),
burn (the wood), blow (the sand); (6) play (baseball), act (a part); (7) play
(with someone); (8) know (something), experience (a feeling), feel (happy),
believe (something); (9) see (something), understand (someone), feel (an object),
like (something).


[Figure 6 diagram: a network over the feature choices RESULTIVE (+result) /
PROCESSIVE (-result), PHYSICAL (+physical) / COGNITIVE (-physical),
SYMBOLIC (+theme) / NONSYMBOLIC (-theme), and +simple / RELATIVE (-simple).]

Figure 6. Systemic network representation of Frederiksen's action taxonomy. The
curly bracket indicates 'and'; straight brackets represent 'or'. Thus processive
actions are both +simple or -simple and +physical or -physical.


Reference prop no. P171
Reference sentence:
31.7 They lived both on the southeast coast and in the interior region.
Recall sentences for 1:
31.5 The Indians lived in the south east & middle part of area.
Recall sentences for 2:
31.5 The Indians lived in the south east & middle part of area.
Recall sentences for 3:
31.5 The Indians lived in the south east & middle part of area.

(:INDIAN)       (:INDIAN)       (:INDIAN)       (:INDIAN)
AGT             AGT             AGT             AGT
('LIVE)         ('LIVE)         ('LIVE)         ('LIVE)
TEM             TEM             TEM             TEM
(PAST)          (PAST)          (PAST)          (PAST)
('LIVE)         ('LIVE)         ('LIVE)         ()
LOC             LOC             LOC             NONE
(:COAST.1)      (:PART.1)       (:SOUTHEAST)    [illegible]
P               P               P               [illegible]
('ON)           [illegible]     [illegible]     [illegible]
('LIVE)         ('LIVE)         ('LIVE)         ('LIVE)
LOC             LOC             LOC             LOC
(:REGION)       (:PART.2)       (:PART)         (:AREA)
P               P               P               P
('IN)           ('IN)           ('IN)           ('IN)


Reference prop no. P172
Reference sentence:
31.7 They lived both on the southeast coast and in the interior region.
Recall sentences for 1:
31.5 The Indians lived in the south east & middle part of area.
Recall sentences for 3:
31.5 The Indians lived in the south east & middle part of area.

(:COAST.1)      (:PART.1)                       (:AREA)
ATT             ATT                             ATT
('SOUTHEAST)    ('SOUTHEAST)                    ('SOUTHEAST)


Reference prop no. P173
Reference sentence:
31.7 They lived both on the southeast coast and in the interior region.
Recall sentences for 1:
31.5 The Indians lived in the south east & middle part of area.
Recall sentences for 2:
31.5 The Indians lived in the south east & middle part of area.
Recall sentences for 3:
31.5 The Indians lived in the south east & middle part of area.

(:REGION)       (:PART.2)       (:PART)         (:AREA)
('INTERIOR)     ('MIDDLE)       ('MIDDLE)       ('MIDDLE)


Figure 7. Scoring variability for BF case 1. Column 1 on the left is the target
structure. Scorers 1-3 are shown in columns 2-4 respectively.


Reference prop no. P251
Reference sentence:
32.5 They discovered Saint Lawrence Island, now part of Alaska, and
sailed through the Bering Strait between Asia and North America.
Recall sentences for 1:
32.3 The explorer found the Bering straight, but couldn't see the
N. American land mass because of fog.

(:BERING)                   (:BERING)
AGT                         AGT
('DISCOVER)                 ('FOUND)
TEM                         TEM
(PAST)                      (PAST)
(:MAN)                      ()
AGT                         NONE
('DISCOVER)                 ()
TEM                         NONE
(PAST)                      ()
('DISCOVER)                 ('FOUND)
OBJ                         OBJ
(:ST.LAWRENCE.ISLAND)       (:BERING.STRAIGHT.)


Reference prop no. P252
Reference sentence:
32.5 They discovered Saint Lawrence Island, now part of Alaska, and
sailed through the Bering Strait between Asia and North America.
Recall sentences for 2:
32.3 The explorer found the Bering straight, but couldn't see the
N. American land mass because of fog.
Recall sentences for 3:
32.3 The explorer found the Bering straight, but couldn't see the
N. American land mass because of fog.

(:BERING)                               (:EXPLORER)         (:EXPLORER)
AGT                                     AGT                 AGT
('SAIL)                                 ('FIND)             ('FIND)
TEM                                     TEM                 TEM
(PAST)                                  (PAST)              (PAST)
(:MAN)                                  ()                  ()
AGT                                     NONE                NONE
('SAIL)                                 ()                  ()
TEM                                     NONE                NONE
(PAST)                                  ()                  ()
('SAIL)                                 ('FIND)             ('FIND)
LOC                                     OBJ                 LOC
(:BERING STRAIT)                        (:BERING STRAIT)    (:BERING STRAIT)
P                                       NONE                NONE
('THROUGH)                              ()                  ()

Figure 8. Scoring variability for BF case 2. Column 1 on the left is the target
structure. Scorers 1-3 are shown in columns 2-4 respectively.


(:REGION) -ATT-> ('INTERIOR)

(:GROUP) -CAT-> (:INDIAN)

(:INDIAN) -AGT-> ('LIVE) -LOC@P('ON)-> (:COAST)
                         -LOC@P('IN)-> (:REGION)

[The node-and-arc drawing of this network and the adjacency matrix of panel b
are not legible in this copy.]

Figure 9. a) A semantic network with b) its corresponding adjacency matrix.


